Titanic Survival Analysis:

First steps:

First, we need to import all the libraries needed for the analysis and load the data file:



In [1]:

    
# Import the libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import scipy



In [2]:

    
# Read the csv file
titanic = pd.read_csv("titanic-data.csv")

The next step is to explore the dataset:



In [3]:

    
titanic.shape









    Out[3]:





(891, 12)



In [4]:

    
titanic.columns









    Out[4]:





Index([u'PassengerId', u'Survived', u'Pclass', u'Name', u'Sex', u'Age',
       u'SibSp', u'Parch', u'Ticket', u'Fare', u'Cabin', u'Embarked'],
      dtype='object')



In [5]:

    
titanic









    Out[5]:






  
    
      
	PassengerId	Survived	Pclass	Name	Sex	Age	SibSp	Parch	Ticket	Fare	Cabin	Embarked
0	1	0	3	Braund, Mr. Owen Harris	male	22.0	1	0	A/5 21171	7.2500	NaN	S
1	2	1	1	Cumings, Mrs. John Bradley (Florence Briggs Th...	female	38.0	1	0	PC 17599	71.2833	C85	C
2	3	1	3	Heikkinen, Miss. Laina	female	26.0	0	0	STON/O2. 3101282	7.9250	NaN	S
3	4	1	1	Futrelle, Mrs. Jacques Heath (Lily May Peel)	female	35.0	1	0	113803	53.1000	C123	S
4	5	0	3	Allen, Mr. William Henry	male	35.0	0	0	373450	8.0500	NaN	S
5	6	0	3	Moran, Mr. James	male	NaN	0	0	330877	8.4583	NaN	Q
6	7	0	1	McCarthy, Mr. Timothy J	male	54.0	0	0	17463	51.8625	E46	S
7	8	0	3	Palsson, Master. Gosta Leonard	male	2.0	3	1	349909	21.0750	NaN	S
8	9	1	3	Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)	female	27.0	0	2	347742	11.1333	NaN	S
9	10	1	2	Nasser, Mrs. Nicholas (Adele Achem)	female	14.0	1	0	237736	30.0708	NaN	C
10	11	1	3	Sandstrom, Miss. Marguerite Rut	female	4.0	1	1	PP 9549	16.7000	G6	S
11	12	1	1	Bonnell, Miss. Elizabeth	female	58.0	0	0	113783	26.5500	C103	S
12	13	0	3	Saundercock, Mr. William Henry	male	20.0	0	0	A/5. 2151	8.0500	NaN	S
13	14	0	3	Andersson, Mr. Anders Johan	male	39.0	1	5	347082	31.2750	NaN	S
14	15	0	3	Vestrom, Miss. Hulda Amanda Adolfina	female	14.0	0	0	350406	7.8542	NaN	S
15	16	1	2	Hewlett, Mrs. (Mary D Kingcome)	female	55.0	0	0	248706	16.0000	NaN	S
16	17	0	3	Rice, Master. Eugene	male	2.0	4	1	382652	29.1250	NaN	Q
17	18	1	2	Williams, Mr. Charles Eugene	male	NaN	0	0	244373	13.0000	NaN	S
18	19	0	3	Vander Planke, Mrs. Julius (Emelia Maria Vande...	female	31.0	1	0	345763	18.0000	NaN	S
19	20	1	3	Masselmani, Mrs. Fatima	female	NaN	0	0	2649	7.2250	NaN	C
20	21	0	2	Fynney, Mr. Joseph J	male	35.0	0	0	239865	26.0000	NaN	S
21	22	1	2	Beesley, Mr. Lawrence	male	34.0	0	0	248698	13.0000	D56	S
22	23	1	3	McGowan, Miss. Anna "Annie"	female	15.0	0	0	330923	8.0292	NaN	Q
23	24	1	1	Sloper, Mr. William Thompson	male	28.0	0	0	113788	35.5000	A6	S
24	25	0	3	Palsson, Miss. Torborg Danira	female	8.0	3	1	349909	21.0750	NaN	S
25	26	1	3	Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...	female	38.0	1	5	347077	31.3875	NaN	S
26	27	0	3	Emir, Mr. Farred Chehab	male	NaN	0	0	2631	7.2250	NaN	C
27	28	0	1	Fortune, Mr. Charles Alexander	male	19.0	3	2	19950	263.0000	C23 C25 C27	S
28	29	1	3	O'Dwyer, Miss. Ellen "Nellie"	female	NaN	0	0	330959	7.8792	NaN	Q
29	30	0	3	Todoroff, Mr. Lalio	male	NaN	0	0	349216	7.8958	NaN	S
...	...	...	...	...	...	...	...	...	...	...	...	...
861	862	0	2	Giles, Mr. Frederick Edward	male	21.0	1	0	28134	11.5000	NaN	S
862	863	1	1	Swift, Mrs. Frederick Joel (Margaret Welles Ba...	female	48.0	0	0	17466	25.9292	D17	S
863	864	0	3	Sage, Miss. Dorothy Edith "Dolly"	female	NaN	8	2	CA. 2343	69.5500	NaN	S
864	865	0	2	Gill, Mr. John William	male	24.0	0	0	233866	13.0000	NaN	S
865	866	1	2	Bystrom, Mrs. (Karolina)	female	42.0	0	0	236852	13.0000	NaN	S
866	867	1	2	Duran y More, Miss. Asuncion	female	27.0	1	0	SC/PARIS 2149	13.8583	NaN	C
867	868	0	1	Roebling, Mr. Washington Augustus II	male	31.0	0	0	PC 17590	50.4958	A24	S
868	869	0	3	van Melkebeke, Mr. Philemon	male	NaN	0	0	345777	9.5000	NaN	S
869	870	1	3	Johnson, Master. Harold Theodor	male	4.0	1	1	347742	11.1333	NaN	S
870	871	0	3	Balkic, Mr. Cerin	male	26.0	0	0	349248	7.8958	NaN	S
871	872	1	1	Beckwith, Mrs. Richard Leonard (Sallie Monypeny)	female	47.0	1	1	11751	52.5542	D35	S
872	873	0	1	Carlsson, Mr. Frans Olof	male	33.0	0	0	695	5.0000	B51 B53 B55	S
873	874	0	3	Vander Cruyssen, Mr. Victor	male	47.0	0	0	345765	9.0000	NaN	S
874	875	1	2	Abelson, Mrs. Samuel (Hannah Wizosky)	female	28.0	1	0	P/PP 3381	24.0000	NaN	C
875	876	1	3	Najib, Miss. Adele Kiamie "Jane"	female	15.0	0	0	2667	7.2250	NaN	C
876	877	0	3	Gustafsson, Mr. Alfred Ossian	male	20.0	0	0	7534	9.8458	NaN	S
877	878	0	3	Petroff, Mr. Nedelio	male	19.0	0	0	349212	7.8958	NaN	S
878	879	0	3	Laleff, Mr. Kristo	male	NaN	0	0	349217	7.8958	NaN	S
879	880	1	1	Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)	female	56.0	0	1	11767	83.1583	C50	C
880	881	1	2	Shelley, Mrs. William (Imanita Parrish Hall)	female	25.0	0	1	230433	26.0000	NaN	S
881	882	0	3	Markun, Mr. Johann	male	33.0	0	0	349257	7.8958	NaN	S
882	883	0	3	Dahlberg, Miss. Gerda Ulrika	female	22.0	0	0	7552	10.5167	NaN	S
883	884	0	2	Banfield, Mr. Frederick James	male	28.0	0	0	C.A./SOTON 34068	10.5000	NaN	S
884	885	0	3	Sutehall, Mr. Henry Jr	male	25.0	0	0	SOTON/OQ 392076	7.0500	NaN	S
885	886	0	3	Rice, Mrs. William (Margaret Norton)	female	39.0	0	5	382652	29.1250	NaN	Q
886	887	0	2	Montvila, Rev. Juozas	male	27.0	0	0	211536	13.0000	NaN	S
887	888	1	1	Graham, Miss. Margaret Edith	female	19.0	0	0	112053	30.0000	B42	S
888	889	0	3	Johnston, Miss. Catherine Helen "Carrie"	female	NaN	1	2	W./C. 6607	23.4500	NaN	S
889	890	1	1	Behr, Mr. Karl Howell	male	26.0	0	0	111369	30.0000	C148	C
890	891	0	3	Dooley, Mr. Patrick	male	32.0	0	0	370376	7.7500	NaN	Q
    
  

891 rows × 12 columns

PassengerId, Name, Ticket, Cabin and Embarked are not used in this analysis, so we drop these columns from the dataset:



In [6]:

    
titanic = titanic.drop(['PassengerId','Name','Ticket', 'Cabin', 'Embarked'], axis=1)



In [7]:

    
titanic['Survived'].describe()









    Out[7]:





count    891.000000
mean       0.383838
std        0.486592
min        0.000000
25%        0.000000
50%        0.000000
75%        1.000000
max        1.000000
Name: Survived, dtype: float64

Data cleaning:

We can see that the Age column has a lot of missing values (NaN). We fill in the blanks with random integers drawn from the range between one standard deviation below and one standard deviation above the mean age.
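Before imputing, it can help to count the missing values per column. The snippet below is a minimal sketch on a small stand-in frame (`sample`, a hypothetical name, holding rows 0, 1, 5, 6 and 17 of the table shown earlier, restricted to three columns); on the real data the same calls would be applied to `titanic`.

```python
import numpy as np
import pandas as pd

# Stand-in frame: rows 0, 1, 5, 6 and 17 of the table shown earlier,
# restricted to three columns (an assumption made to keep the example small).
sample = pd.DataFrame({
    "Survived": [0, 1, 0, 0, 1],
    "Age": [22.0, 38.0, np.nan, 54.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, "E46", np.nan],
})

# Number and share of missing values per column
missing_counts = sample.isnull().sum()
missing_share = sample.isnull().mean()

print(missing_counts)
print(missing_share)
```

On the full dataset, `describe()` reports 714 non-missing ages out of 891 rows, so 177 ages are missing.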



In [8]:

    
titanic['Age'].describe()









    Out[8]:





count    714.000000
mean      29.699118
std       14.526497
min        0.420000
25%       20.125000
50%       28.000000
75%       38.000000
max       80.000000
Name: Age, dtype: float64



In [9]:

    
average_age = titanic["Age"].mean()
std_age = titanic["Age"].std()
count_nan_age = titanic["Age"].isnull().sum()
# generate random numbers between (mean - std) & (mean + std)
rand = np.random.randint(average_age - std_age, average_age + std_age, size = count_nan_age)



In [10]:

    
# Fill missing ages with the random values generated above
# (.loc assigns on the DataFrame itself and avoids the SettingWithCopyWarning)
titanic.loc[titanic['Age'].isnull(), 'Age'] = rand



In [11]:

    
titanic['Age'].describe()









    Out[11]:





count    891.000000
mean      29.538911
std       13.523755
min        0.420000
25%       21.000000
50%       28.000000
75%       38.000000
max       80.000000
Name: Age, dtype: float64
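Because the fill values are random, every run of the notebook gives slightly different results. Below is a minimal sketch of a reproducible variant. The fixed seed (0), the short `ages` series and the variable names are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Short stand-in for titanic['Age'] with a few missing values (hypothetical data)
ages = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan, 54.0, 2.0, np.nan], name="Age")

# A fixed seed makes the random fill repeatable (the seed value is an assumption)
rng = np.random.RandomState(0)

mean, std = ages.mean(), ages.std()
missing = ages.isnull()

# Integer bounds of the interval [mean - std, mean + std)
low, high = int(mean - std), int(mean + std)
fill = rng.randint(low, high, size=int(missing.sum()))

# Assign through .loc on a copy so the original series stays untouched
imputed = ages.copy()
imputed.loc[missing] = fill

print(imputed)
```

Drawing only within one standard deviation of the mean narrows the distribution, which is consistent with the standard deviation of Age dropping from 14.53 to 13.52 after filling.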



In [12]:

    
sns.distplot(titanic['Age'])
plt.show()

A passenger's family size is the number of siblings/spouses (SibSp) plus the number of parents/children (Parch) aboard, plus the passenger themselves:



In [13]:

    
# Family size
titanic['Family_size'] = titanic['SibSp'] + titanic['Parch'] + 1

Now we extract the subset of survivors for later analysis:



In [14]:

    
survived = titanic[titanic['Survived'] == 1]

Questions:

According to Wikipedia, "Women and children first" is a code of conduct dating from 1860, whereby the lives of women and children were to be saved first in a life-threatening situation, typically abandoning ship, when survival resources such as lifeboats were limited. The Wikipedia page gives some insights and statistics on survival rates on the Titanic; in this analysis, I try to confirm them and to find out which other factors influenced survival in the Titanic tragedy.

The questions I am going to answer in this analysis are:

Was there really a "Women and children first" rule on the Titanic?
Did other factors such as wealth/classes and family sizes affect someone's chance of survival?

Women and children first?

Assuming that a child's gender made no difference to the rescuers, I split the passengers into three types: children (aged 16 or under), female adults and male adults:



In [15]:

    
def passenger_type(person):
    if person['Age'] <= 16:
        return "child"
    elif person['Sex'] == "female":
        return "female_adult"
    else:
        return "male_adult"

titanic['Type'] = titanic.apply(passenger_type, axis = 1)
titanic









    Out[15]:






  
    
      
	Survived	Pclass	Sex	Age	SibSp	Parch	Fare	Family_size	Type
0	0	3	male	22.0	1	0	7.2500	2	male_adult
1	1	1	female	38.0	1	0	71.2833	2	female_adult
2	1	3	female	26.0	0	0	7.9250	1	female_adult
3	1	1	female	35.0	1	0	53.1000	2	female_adult
4	0	3	male	35.0	0	0	8.0500	1	male_adult
5	0	3	male	16.0	0	0	8.4583	1	child
6	0	1	male	54.0	0	0	51.8625	1	male_adult
7	0	3	male	2.0	3	1	21.0750	5	child
8	1	3	female	27.0	0	2	11.1333	3	female_adult
9	1	2	female	14.0	1	0	30.0708	2	child
10	1	3	female	4.0	1	1	16.7000	3	child
11	1	1	female	58.0	0	0	26.5500	1	female_adult
12	0	3	male	20.0	0	0	8.0500	1	male_adult
13	0	3	male	39.0	1	5	31.2750	7	male_adult
14	0	3	female	14.0	0	0	7.8542	1	child
15	1	2	female	55.0	0	0	16.0000	1	female_adult
16	0	3	male	2.0	4	1	29.1250	6	child
17	1	2	male	18.0	0	0	13.0000	1	male_adult
18	0	3	female	31.0	1	0	18.0000	2	female_adult
19	1	3	female	34.0	0	0	7.2250	1	female_adult
20	0	2	male	35.0	0	0	26.0000	1	male_adult
21	1	2	male	34.0	0	0	13.0000	1	male_adult
22	1	3	female	15.0	0	0	8.0292	1	child
23	1	1	male	28.0	0	0	35.5000	1	male_adult
24	0	3	female	8.0	3	1	21.0750	5	child
25	1	3	female	38.0	1	5	31.3875	7	female_adult
26	0	3	male	24.0	0	0	7.2250	1	male_adult
27	0	1	male	19.0	3	2	263.0000	6	male_adult
28	1	3	female	25.0	0	0	7.8792	1	female_adult
29	0	3	male	31.0	0	0	7.8958	1	male_adult
...	...	...	...	...	...	...	...	...	...
861	0	2	male	21.0	1	0	11.5000	2	male_adult
862	1	1	female	48.0	0	0	25.9292	1	female_adult
863	0	3	female	41.0	8	2	69.5500	11	female_adult
864	0	2	male	24.0	0	0	13.0000	1	male_adult
865	1	2	female	42.0	0	0	13.0000	1	female_adult
866	1	2	female	27.0	1	0	13.8583	2	female_adult
867	0	1	male	31.0	0	0	50.4958	1	male_adult
868	0	3	male	28.0	0	0	9.5000	1	male_adult
869	1	3	male	4.0	1	1	11.1333	3	child
870	0	3	male	26.0	0	0	7.8958	1	male_adult
871	1	1	female	47.0	1	1	52.5542	3	female_adult
872	0	1	male	33.0	0	0	5.0000	1	male_adult
873	0	3	male	47.0	0	0	9.0000	1	male_adult
874	1	2	female	28.0	1	0	24.0000	2	female_adult
875	1	3	female	15.0	0	0	7.2250	1	child
876	0	3	male	20.0	0	0	9.8458	1	male_adult
877	0	3	male	19.0	0	0	7.8958	1	male_adult
878	0	3	male	38.0	0	0	7.8958	1	male_adult
879	1	1	female	56.0	0	1	83.1583	2	female_adult
880	1	2	female	25.0	0	1	26.0000	2	female_adult
881	0	3	male	33.0	0	0	7.8958	1	male_adult
882	0	3	female	22.0	0	0	10.5167	1	female_adult
883	0	2	male	28.0	0	0	10.5000	1	male_adult
884	0	3	male	25.0	0	0	7.0500	1	male_adult
885	0	3	female	39.0	0	5	29.1250	6	female_adult
886	0	2	male	27.0	0	0	13.0000	1	male_adult
887	1	1	female	19.0	0	0	30.0000	1	female_adult
888	0	3	female	19.0	1	2	23.4500	4	female_adult
889	1	1	male	26.0	0	0	30.0000	1	male_adult
890	0	3	male	32.0	0	0	7.7500	1	male_adult
    
  

891 rows × 9 columns
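The row-wise `apply` used for the Type column can also be written in vectorized form with `np.select`, which tends to be faster on larger frames. This is a sketch of an alternative, not the method used in the analysis. The `people` frame is a hypothetical stand-in built from rows 0, 1, 5, 7 and 9 of the output above.

```python
import numpy as np
import pandas as pd

# Stand-in rows (Sex and Age of rows 0, 1, 5, 7 and 9 from the output above)
people = pd.DataFrame({
    "Sex": ["male", "female", "male", "male", "female"],
    "Age": [22.0, 38.0, 16.0, 2.0, 14.0],
})

# Conditions are checked in order: the first one that is True wins,
# which mirrors the if / elif / else chain of passenger_type()
conditions = [
    people["Age"] <= 16,
    people["Sex"] == "female",
]
choices = ["child", "female_adult"]

people["Type"] = np.select(conditions, choices, default="male_adult")
print(people)
```

A missing age compares as False against `<= 16`, so a passenger with an unknown age would be classified as an adult; this is why the ages are filled in before the types are assigned.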



In [16]:

    
titanic['Type'].value_counts()









    Out[16]:





male_adult      517
female_adult    263
child           111
Name: Type, dtype: int64



In [17]:

    
sns.set(style="darkgrid")
ax = sns.countplot(x="Type", data = titanic)
plt.show()

We can see that male adults were the largest group on the ship, followed by female adults and children.

Now looking into the survival rate:



In [18]:

    
survived = titanic[titanic['Survived'] == 1]
non_survived = titanic[titanic['Survived'] == 0]



In [19]:

    
survived['Type'].value_counts()









    Out[19]:





female_adult    199
male_adult       87
child            56
Name: Type, dtype: int64



In [20]:

    
non_survived['Type'].value_counts()









    Out[20]:





male_adult      430
female_adult     64
child            55
Name: Type, dtype: int64



In [21]:

    
sns.set(style="darkgrid")
ax = sns.countplot(x="Survived", hue = "Type", data = titanic)
plt.show()

Compared with the initial number of people of each type, children had a survival rate of just over 50% (56 of 111), female adults an impressive rate of around 76% (199 of 263), and male adults a low rate of around 17% (87 of 517). This suggests that an implicit "women and children first" code was followed when it came to saving people on the ship.
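The survival rates per type follow directly from the two `value_counts()` outputs above. A minimal sketch of the arithmetic, with the counts copied from Out[16] and Out[19] (the variable names are illustrative):

```python
import pandas as pd

# Counts copied from Out[16] (all passengers) and Out[19] (survivors)
total = pd.Series({"male_adult": 517, "female_adult": 263, "child": 111})
survivors = pd.Series({"male_adult": 87, "female_adult": 199, "child": 56})

# Survival rate per passenger type; pandas aligns the two series on their labels
rate = (survivors / total).round(3)
print(rate)

# On the full DataFrame the same rates can be obtained with:
#   titanic.groupby('Type')['Survived'].mean()
```

Since the missing ages were filled with random values, the exact split between children and adults, and therefore these rates, can shift slightly between runs.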



In [22]:

    
sns.distplot(survived['Age'])
plt.show()

The age distribution of the survivors is also consistent with younger passengers having had a better chance of survival than older ones, although confirming this requires comparing it with the age distribution of all passengers.
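A histogram of survivors alone mixes two effects: how many passengers there were at each age, and how likely they were to survive. One way to separate them is to compute the survival rate per age band. The sketch below uses 16 rows copied from the table at the top (rows 0 to 16, without row 5, whose age is missing). The band edges 16, 40 and 80 are assumptions chosen for illustration, and 16 rows are far too few to draw any conclusion.

```python
import pandas as pd

# Age and Survived of rows 0-4 and 6-16 from the table at the top
sample = pd.DataFrame({
    "Age": [22, 38, 26, 35, 35, 54, 2, 27, 14, 4, 58, 20, 39, 14, 55, 2],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0],
})

# Bin the ages into bands (0, 16], (16, 40] and (40, 80]
sample["AgeBand"] = pd.cut(
    sample["Age"], bins=[0, 16, 40, 80], labels=["0-16", "17-40", "41-80"]
)

# Mean of a 0/1 column is the share of survivors in each band
rate = sample.groupby("AgeBand", observed=True)["Survived"].mean()
print(rate)
```

Applied to the full `titanic` frame, the same grouping would show whether the younger bands really had a higher survival rate.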

Socio-economic classes:

We can assume that a passenger's class on the Titanic represented their socio-economic status. We also assume that fares correlate directly with class, so we only need to examine one of the two.
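The assumption that fare tracks class can be checked by comparing the median fare per class. A minimal sketch on the first ten rows of the table shown at the top (the `sample` frame is a stand-in; on the real data the same `groupby` would be applied to `titanic`):

```python
import pandas as pd

# Pclass and Fare of rows 0-9 from the table at the top
sample = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583, 51.8625, 21.075, 11.1333, 30.0708],
})

# Median fare per class; the median is less sensitive to a few very high fares
median_fare = sample.groupby("Pclass")["Fare"].median()
print(median_fare)
```

In these ten rows the median fare falls from first to third class, which is in line with the assumption, but the check is only meaningful on the full dataset.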



In [23]:

    
titanic['Pclass'].value_counts()









    Out[23]:





3    491
1    216
2    184
Name: Pclass, dtype: int64



In [24]:

    
sns.set(style="darkgrid")
ax = sns.countplot(x = "Pclass", data = titanic)
plt.show()

Approximately 55% of the passengers belonged to the third class, while the rest were split between the first class (about 24%) and the second class (about 21%). Did the first- and second-class passengers also pay a premium when it came to safety?



In [25]:

    
survived['Pclass'].value_counts()









    Out[25]:





1    136
3    119
2     87
Name: Pclass, dtype: int64



In [26]:

    
sns.set(style="darkgrid")
ax = sns.countplot(x = "Pclass", hue = "Survived", data = titanic)
plt.show()

The survival rate of the first-class passengers was more than 60% (136 of 216), while that of the third-class passengers was only around 24% (119 of 491). So wealth and socio-economic status appear to have mattered, even in a life-threatening situation.
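The class shares and survival rates quoted in this section can be recomputed from the two `value_counts()` outputs. A minimal sketch with the counts copied from Out[23] and Out[25] (the variable names are illustrative):

```python
import pandas as pd

# Counts copied from Out[23] (all passengers) and Out[25] (survivors)
total = pd.Series({1: 216, 2: 184, 3: 491})
survivors = pd.Series({1: 136, 2: 87, 3: 119})

# Share of each class among all passengers
share = (total / total.sum()).round(3)

# Survival rate within each class
rate = (survivors / total).round(3)

print(pd.DataFrame({"share": share, "survival_rate": rate}))
```

The second class sits between the other two, with a survival rate of about 47% (87 of 184).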

Now, what if we factor in both passenger class and type (male adult, female adult or child): which carries more weight in the survival rate?



In [27]:

    
titanic.groupby(['Pclass', 'Type']).Type.count()









    Out[27]:





Pclass  Type        
1       child            12
        female_adult     88
        male_adult      116
2       child            21
        female_adult     66
        male_adult       97
3       child            78
        female_adult    109
        male_adult      304
Name: Type, dtype: int64



In [28]:

    
sns.set(style="darkgrid")
ax = sns.countplot(x = "Pclass", hue = "Type", data = titanic)
plt.show()



In [29]:

    
titanic.groupby(['Pclass', 'Type']).agg({'Survived': 'sum'})









    Out[29]:






  
    
      
      
		Survived
Pclass	Type
1	child	8
	female_adult	86
	male_adult	42
2	child	19
	female_adult	60
	male_adult	8
3	child	29
	female_adult	53
	male_adult	37



In [30]:

    
sns.set(style="darkgrid")
ax = sns.countplot(x = "Pclass", hue = "Type", data = survived)
plt.show()

We can see that the women of the first class had a very high survival rate (about 98%, 86 of 88) and the children of the first class a rate of about 67% (8 of 12), whereas the women and children of the third class had much lower survival rates (about 49% and 37% respectively). However, the women and children from the third class still had a higher survival rate than the men from higher classes. Men from the first class had a survival rate of around 36%, which was actually below the overall survival rate of 38.38%. Men from the second and third classes suffered very low survival rates, around 8% and 12% respectively, compared with their initial numbers.
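The rates per class and type can be derived by dividing the survivor counts in Out[29] by the passenger counts in Out[27]. A minimal sketch of that calculation, with the counts copied from those two outputs (the variable names are illustrative):

```python
import pandas as pd

# Index in the same order as the grouped outputs: class first, then type
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["child", "female_adult", "male_adult"]],
    names=["Pclass", "Type"],
)

# Counts copied from Out[27] (all passengers) and Out[29] (survivors)
total = pd.Series([12, 88, 116, 21, 66, 97, 78, 109, 304], index=index)
survivors = pd.Series([8, 86, 42, 19, 60, 8, 29, 53, 37], index=index)

# Survival rate per (class, type), reshaped to one row per class
rate = (survivors / total).unstack("Type").round(3)
print(rate)

# On the full DataFrame the same table can be obtained with:
#   titanic.groupby(['Pclass', 'Type'])['Survived'].mean().unstack()
```

The small group sizes matter here: with only 12 first-class children, a single passenger changes the rate by more than 8 percentage points.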

Family size:

Did people have a higher chance of survival if they traveled with family rather than traveling alone? We'll find out.



In [31]:

    
titanic['Family_size'].value_counts()









    Out[31]:





1     537
2     161
3     102
4      29
6      22
5      15
7      12
11      7
8       6
Name: Family_size, dtype: int64

We can see that the majority of passengers traveled alone, followed by families of 2 or 3. Families with more than 3 members made up a small part of the ship. Now let's look at the survival statistics:



In [32]:

    
survived['Family_size'].value_counts()









    Out[32]:





1    163
2     89
3     59
4     21
7      4
6      3
5      3
Name: Family_size, dtype: int64



In [33]:

    
sns.boxplot(x="Survived", y="Family_size", data=titanic)
plt.show()



In [34]:

    
sns.kdeplot(survived['Family_size'], shade=True)
plt.show()

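The counts above, together with the boxplot and the density curve, show that the survivors came overwhelmingly from small families: passengers with a family size under 4 made up about 91% of the survivors (311 of 342). Big families seem to have been penalized harshly on survival rate: only 10 of the 62 passengers with a family size of 5 or more survived, and nobody from the families of 8 or 11 did.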
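Survivor counts alone can mislead, because small families were also far more common on board. Dividing the survivor counts in Out[32] by the passenger counts in Out[31] gives the survival rate per family size. A minimal sketch with the counts copied from those outputs (the variable names are illustrative):

```python
import pandas as pd

# Counts copied from Out[31] (all passengers) and Out[32] (survivors)
total = pd.Series({1: 537, 2: 161, 3: 102, 4: 29, 5: 15, 6: 22, 7: 12, 8: 6, 11: 7})
survivors = pd.Series({1: 163, 2: 89, 3: 59, 4: 21, 5: 3, 6: 3, 7: 4})

# Family sizes without any survivor are absent from Out[32]; treat them as 0
survivors = survivors.reindex(total.index, fill_value=0)

rate = (survivors / total).round(3)
print(rate)
```

Seen as rates, passengers traveling alone (about 30%) did worse than those in families of 2 to 4 (about 55% to 72%), while the rates for family sizes of 5 and above rest on very small groups.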

Conclusion:

In this analysis, we can see that there was a clear trend of "Women and children first" when it came to helping and rescuing people from the Titanic. The data also suggests that socio-economic class and family size affected a passenger's chance of survival, although they did not have as much impact as the "women and children first" rule. Also, women and children from lower classes still had a better chance of survival than men from higher classes.

The analysis has a few limitations. Many values were missing in the Age column, and filling them with random numbers introduces a margin of error into the analysis. With more knowledge of how to make use of variables such as Name, Embarked or Cabin, the analysis could also be improved.
